The Datasets module

Calculation datasets.

This module deals with the handling of series of calculations. Classes and functions of this module are meant to simplify the approach to ensemble calculations with the code, and to deal with parallel executions of multiple instances of the code.

class Dataset(label='BigDFT dataset', run_dir='runs', **kwargs)[source]

A set of calculations.

This class contains the various instances of a set of calculations with the code. The calculations are labelled by parameter values and information that can later be retrieved for inspection and plotting.

Parameters
  • label (str) – The label of the dataset. It is used to identify the instance, for example in plot titles or in the running directory.

  • run_dir (str) – path of the directory where the runs will be performed.

  • input (dict) – Input file used as the default for the runs; it can be overridden by the specific inputs of each run.

append_run(id, runner, **kwargs)[source]

Add a run into the dataset.

Appends the corresponding runner, together with the arguments associated with it, to the list of runs to be performed.

Parameters
  • id (dict) – the id of the run, used to identify the run in the dataset. It has to be a dictionary, as it may contain several keywords. For example, a run might be classified as id = {'hgrid':0.35, 'crmult': 5}.

  • runner (Runner) – the runner to which the remaining keyword arguments will be passed as input.

Raises

ValueError – if the provided id is identical to another previously appended run.
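As a minimal sketch of the uniqueness rule above (not the BigDFT implementation), a registry could compare each new id dictionary against those already stored. The class RunRegistry and its attributes are hypothetical names introduced only for illustration.

```python
class RunRegistry:
    """Hypothetical stand-in for the bookkeeping done by append_run."""

    def __init__(self):
        self.ids = []    # id dictionaries, in insertion order
        self.runs = []   # keyword arguments associated with each id

    def append_run(self, id, **kwargs):
        # Two dictionaries compare equal when they hold the same items,
        # regardless of the order in which the keys were written.
        if id in self.ids:
            raise ValueError('The run id %r was already appended' % (id,))
        self.ids.append(id)
        self.runs.append(kwargs)


registry = RunRegistry()
registry.append_run({'hgrid': 0.35, 'crmult': 5}, input={})
try:
    # Same items in a different key order: treated as the same id.
    registry.append_run({'crmult': 5, 'hgrid': 0.35}, input={})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Because the comparison is on dictionary content, reordering the keys of an id does not make it a new run.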

Todo

include id in the runs specification

calculators

Calculators which will be used by the run method, useful to gather the inputs in the case of a multiple run.

fetch_results(id=None, attribute=None, run_if_not_present=True)[source]

Retrieve some attribute from some of the results.

Selects out of the results the objects which have in their id at least the dictionary specified as input. May return an attribute of each result if needed.

Parameters
  • id (dict) – dictionary of the id to retrieve. A list is returned of the runs whose id contains the provided id argument, in the order given by append_run(). If absent, the entire list of runs is returned.

  • attribute (str) – if present, provide the attribute of each of the results instead of the result object

  • run_if_not_present (bool) – If the run has not yet been performed in the dataset then perform it.
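The selection rule described above (a run is retained when its id contains at least the requested key/value pairs) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper names matches and select are hypothetical, and results are represented by plain strings.

```python
def matches(selector, run_id):
    """Return True when every key/value pair of selector appears in run_id."""
    return all(key in run_id and run_id[key] == value
               for key, value in selector.items())


def select(ids, results, selector=None):
    """Pick the results whose id contains selector, keeping insertion order."""
    if selector is None:
        # No id provided: the entire list is returned.
        return list(results)
    return [res for run_id, res in zip(ids, results)
            if matches(selector, run_id)]


ids = [{'cr': 3}, {'cr': 4}, {'cr': 3, 'h': 0.5}]
results = ['first', 'second', 'third']
picked = select(ids, results, {'cr': 3})
```

With these inputs, the first and third entries are selected, since both ids contain 'cr': 3.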

Example

>>> from BigDFT.Calculators import SystemCalculator
>>> from BigDFT.Datasets import Dataset
>>> # append_run requires a runner; a SystemCalculator is assumed here
>>> code = SystemCalculator()
>>> study = Dataset()
>>> study.append_run(id={'cr': 3}, runner=code,
...                  input={'dft': {'rmult': [3, 8]}})
>>> study.append_run(id={'cr': 4}, runner=code,
...                  input={'dft': {'rmult': [4, 8]}})
>>> study.append_run(id={'cr': 3, 'h': 0.5}, runner=code,
...                  input={'dft': {'hgrids': 0.5, 'rmult': [4, 8]}})
>>> # append other runs if needed
>>> # run the calculations (optional if run_if_not_present=True)
>>> study.run()
>>> # returns a list of the energies of the first and the third result
>>> # in this example
>>> data = study.fetch_results(id={'cr': 3}, attribute='energy')

get_global_option(key)

Get one key in global options. The key must exist. Useful to force implementation of compulsory options when subclassing.

Parameters

key (string) – the global option key

Returns

The value of the global options labelled by key

global_options()

Get the dictionary of all global options.

Returns

The dictionary of the global options in its current state

Return type

dict

ids

List of run ids, to be used in order to classify and fetch the results

names

List of run names, needed for distinguishing the logfiles and input files. Each name should be unique to correctly identify a run.

pop_global_option(key)

Remove a given global option from the global options dictionary.

Parameters

key (string) – the global option key

Returns

The value of the global option

post_processing(**kwargs)[source]

Calls the post-processing function of the Dataset with the results of the runs as arguments.

pre_processing()

Pre-treat the keyword arguments and the options, if needed.

Returns

dictionary of the pre-treated keyword arguments that have to be actually considered by process_run.

Return type

dict

process_run()[source]

Run the dataset by explicitly running each of the items of the runs_list.

results

Set of the results of each of the runs. The set is not ordered as the runs may be executed asynchronously.

run(**kwargs)

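Run method of the class. It performs the following actions:

  • calls pre_processing() to obtain the keyword arguments to be considered by process_run();

  • calls process_run() to perform the runs;

  • calls post_processing() with the results of the runs and returns its value.

Developers are therefore expected to override pre_processing(), process_run(), and post_processing() when subclassing Runner.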
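The sequence implied by the three overridable methods can be sketched as a template method. This is a simplified illustration and not the BigDFT Runner: the classes SketchRunner and Doubler, the attribute _global_options, and the exact way the dictionaries are merged are assumptions.

```python
class SketchRunner:
    """Hypothetical base class illustrating the run() sequence."""

    def __init__(self, **kwargs):
        self._global_options = dict(kwargs)

    def pre_processing(self):
        # Return the keyword arguments that process_run should receive.
        return {}

    def process_run(self, **kwargs):
        # Perform the actual work and return a dictionary of results.
        return {}

    def post_processing(self, **kwargs):
        # Turn the collected results into the value returned by run().
        return None

    def run(self, **kwargs):
        # Keyword arguments given to run() update the stored options.
        self._global_options.update(kwargs)
        run_args = self.pre_processing()
        run_results = self.process_run(**run_args)
        run_args.update(run_results)
        return self.post_processing(**run_args)


class Doubler(SketchRunner):
    """Toy subclass overriding the three hooks."""

    def pre_processing(self):
        return {'value': self._global_options['value']}

    def process_run(self, value):
        return {'doubled': 2 * value}

    def post_processing(self, value, doubled):
        return doubled


answer = Doubler().run(value=21)
```

The subclass never touches run() itself; it only supplies the three hooks, which is the division of labour the text above asks of developers.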

runs

List of the runs to be treated by the dataset. These runs contain the input parameters to be passed to the various runners.

seek_convergence(rtol=1e-05, atol=1e-08, selection=None, **kwargs)[source]

Search for the first result of the dataset which matches the provided tolerance parameters. The results are in dataset order (provided by the append_run() method) if selection is not specified. Employs the numpy allclose() function for comparison.

Parameters
  • rtol (float) – relative tolerance parameter

  • atol (float) – absolute tolerance parameter

  • selection (list) – list of the ids of the runs in which to perform the convergence search. Each id should be unique in the dataset.

  • **kwargs – arguments to be passed to the fetch_results() method.

Returns

the id of the last run which matches the convergence, together with the result, if convergence is reached.

Return type

id,result (tuple)

Raises

LookupError – if the parameters for convergence were not found. The dataset has to be enriched or the convergence parameters loosened.
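One possible reading of this search, sketched under assumptions: consecutive results are compared in dataset order, and the first one that its successor reproduces within tolerance is returned. The helper is_close is a scalar stand-in for the numpy allclose() criterion, |a - b| <= atol + rtol * |b|, used here so the sketch needs no external package. The function below is a simplified free function, not the method itself, and the energies are made-up numbers.

```python
def is_close(a, b, rtol=1e-05, atol=1e-08):
    # Scalar form of the criterion used by numpy.allclose.
    return abs(a - b) <= atol + rtol * abs(b)


def seek_convergence(ids, values, rtol=1e-05, atol=1e-08):
    """Return (id, value) for the first entry that the next one reproduces."""
    for i in range(len(values) - 1):
        if is_close(values[i], values[i + 1], rtol=rtol, atol=atol):
            return ids[i], values[i]
    raise LookupError('Convergence not reached: add runs or loosen tolerances')


# Illustrative data only: energies for progressively finer grid spacings.
ids = [{'h': 0.5}, {'h': 0.4}, {'h': 0.3}, {'h': 0.2}]
energies = [-1.20, -1.26, -1.2601, -1.26011]
converged_id, converged_energy = seek_convergence(ids, energies, rtol=1e-3)
```

Tightening the tolerances far enough makes the loop run to the end and raise LookupError, which corresponds to the case where the dataset has to be enriched.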

set_postprocessing_function(func)[source]

Set the callback of run.

Calls the function func after having performed the appended runs.

Parameters

func (func) – function that processes the results of the runs and returns the value of the run method of the dataset. The function is called as func(self).
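The callback wiring can be sketched as below. Only the func(self) calling convention comes from the description above; the class SketchDataset, its attributes, and the fake results are hypothetical.

```python
class SketchDataset:
    """Hypothetical container; only the callback wiring is illustrated."""

    def __init__(self):
        self.results = {}
        self._post_processing_function = None

    def set_postprocessing_function(self, func):
        self._post_processing_function = func

    def run(self):
        # Stand-in for the real runs: store one fake result per run index.
        self.results = {0: -1.0, 1: -1.5}
        if self._post_processing_function is None:
            return None
        # The callback receives the dataset itself, as in func(self).
        return self._post_processing_function(self)


def lowest_energy(dataset):
    """Example callback: reduce the stored results to a single number."""
    return min(dataset.results.values())


study = SketchDataset()
study.set_postprocessing_function(lowest_energy)
best = study.run()
```

Since the callback is handed the whole dataset, it can inspect any of its attributes rather than a fixed argument list.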

update_global_options(**kwargs)

Update the global options by providing keyword arguments.

Parameters

**kwargs – arguments to be updated in the global options

class Dataset(label='BigDFT dataset', run_dir='runs', **kwargs)[source]

Bases: BigDFT.Calculators.Runner

get_logfiles()[source]
get_times()[source]
wait()[source]
combine_datasets(*args)[source]

Define a new instance of the Dataset class that provides, as its result, a list of the runs of the combined datasets.

name_from_id(id)[source]

Hash the id into a run name. Constructs the name of the run from the id dictionary.

Parameters

id (dict) – id associated to the run

Returns

name of the run associated to the dictionary id

Return type

str

names_from_id(id)[source]

Hash the id into a list of run names to be searched with the function id_in_names. The separator ',' is added after each value so that the value of a key is matched in full (to avoid finding 0.3 in 0.39).
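The role of the separator can be illustrated with a small sketch. The 'key:value,' naming format and the signatures below are assumptions made for the example; they are not taken from the BigDFT sources.

```python
def name_from_id(id):
    # Hypothetical format: 'key:value' pairs, sorted by key, each followed
    # by the ',' separator.
    return ''.join('%s:%s,' % (k, v) for k, v in sorted(id.items()))


def names_from_id(id):
    # One search token per key, each closed by the ',' separator.
    return ['%s:%s,' % (k, v) for k, v in sorted(id.items())]


def id_in_names(id, name):
    # The id is found when every one of its tokens occurs in the run name.
    return all(token in name for token in names_from_id(id))


fine_run = name_from_id({'h': 0.39})
naive_hit = 'h:0.3' in fine_run                 # substring without separator
safe_hit = id_in_names({'h': 0.3}, fine_run)    # token ends with ','
exact_hit = id_in_names({'h': 0.39}, fine_run)
```

Without the trailing separator, a plain substring search for h:0.3 would also match the run with h:0.39; closing each token with ',' removes that false match.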